SMT Approaches for Commercial Translation of Subtitles

Authors

Thierry Etchegoyhen

Mark Fishel

Jie Jiang

Mirjam Sepesy Maučec

Abstract

In this presentation, we report experiments on developing statistical machine translation (SMT) systems of practical use for the professional translation of subtitles. We present results of several methods that were tested for this task, describing both positive and negative outcomes. We believe these results to be of interest to companies considering the integration of SMT into multilingual commercial systems, and to researchers interested in the use of current methods for large-scale SMT system development in a specific domain.

The work we describe is part of the SUMAT project, funded through the EU ICT Policy Support Programme (2011–2014), whose goal is to produce machine translation systems for film and TV subtitles for seven language pairs. Nine partners are involved in the project: four subtitle companies (Deluxe Digital Studios, InVision, Titelbild, Voice & Script International) and five technical partners (Athens Technology Center, CapitaTI, TextShuttle, University of Maribor and Vicomtech).

In order to integrate SMT systems into a commercially viable translation workflow, it is vital for such systems to meet quality levels that do not hinder the post-editing experience. Previous experiments (Bywood et al., 2012) have shown that, even in cases of increased productivity for professional translators post-editing machine-translated output, the perception and use of the systems is negatively affected overall by output of poor quality. To overcome this issue and raise SMT quality, we explored several approaches, taking into account issues of training and decoding efficiency, as well as issues regarding the integration of data from different sources and domains. The baseline phrase-based SMT systems were trained on large numbers of translated subtitles provided by the subtitling companies (between 200,000 and 2 million subtitles per language pair), using the Moses framework (Koehn et al., 2007).

To improve the baselines, two sets of experiments were performed: incorporating linguistic information (including factored models in various configurations (Koehn and Hoang, 2007), syntax-based statistical translation and decompounding), and developing larger models by combining in-domain and out-of-domain data via mixture-modeling and perplexity minimization techniques (Sennrich, 2012). Overall, the first approach provided little to no improvement over the baselines, whereas the second one proved successful at a comparatively lower cost. In this talk, we will describe the main experiments and their results, offering insight into the optimal balance between development costs and the requirement for better system accuracy in professional applications.

Sima’an, K., Forcada, M.L., Grasmick, D., Depraetere, H., Way, A. (eds.) Proceedings of the XIV Machine Translation Summit (Nice, September 2–6, 2013), p. 369–370. © 2013 The authors. This article is licensed under a Creative Commons 3.0 licence, no derivative works, attribution, CC-BY-ND.

Related Resources

Using Example-Based Machine Translation to translate DVD Subtitles

Audiovisual Translation (AVT), and in particular subtitling, has been recognised as an area that could potentially benefit from the introduction of machine translation (followed by post-editing) [1],[2],[10],[15],[27],[28]. In recent years the demands on subtitlers have increased, while the payment to subtitlers and time allotted to produce the subtitles have both decreased. Therefore this mark...

Pre-reordering for Statistical Machine Translation of Non-fictional Subtitles

This paper describes the challenges of building a Statistical Machine Translation (SMT) system for non-fictional subtitles. Since our experiments focus on a “difficult” translation direction (i.e. French-German), we investigate several methods to improve the translation performance. We also compare our in-house SMT systems (including domain adaptation and pre-reordering techniques) to other SMT ...

Using Linguistic Annotations in Statistical Machine Translation of Film Subtitles

Statistical Machine Translation (SMT) has been successfully employed to support translation of film subtitles. We explore the integration of Constraint Grammar corpus annotations into a Swedish–Danish subtitle SMT system in the framework of factored SMT. While the usefulness of the annotations is limited with large amounts of parallel data, we show that linguistic annotations can increase the g...

Cross-lingual Sentence Compression for Subtitles

We present an approach for translating subtitles where standard time and space constraints are modeled as part of the generation of translations in a phrase-based statistical machine translation system (PBSMT). We propose and experiment with two promising strategies for jointly translating and compressing subtitles from English into Portuguese. The quality of the automatic translations is measu...

Codification of Nonverbal Elements in Subtitled Texts: A Case Study of the Persian Subtitles of English Movies

Codification of nonverbal elements in subtitling movies is a challenge for translators. The aim of this study was to investigate the strategies used by Iranian subtitlers for codification of nonverbal elements in subtitling English movies into Persian using Perego’s shifts and strategies (2003). For this purpose, a selection of 20 English movies (ST) with their...

Publication date: 2013
